Truncation of Content Terms for Turkish

Sever, Hayri and Tonta, Yaşar. Truncation of Content Terms for Turkish. 2006 (Unpublished) [Report]

PDF: tonta-sever-cicling2006.pdf (272kB)

English abstract

Stemming, truncation, suffix stripping and decompounding algorithms are used in information retrieval (IR) to reduce content terms to their conflated forms, and they are well known both for improving retrieval performance and for saving storage space and processing time. In this paper we investigate the statistical characteristics of truncated terms in Turkish on a text corpus of more than 50 million words, and we measure the vocabulary growth rates for both whole and truncated words. Our findings indicate that truncated words in Turkish exhibit Zipfian behavior and that whole words can be truncated to the average word length (6.2 characters) without compromising retrieval effectiveness. The vocabulary growth rate for truncated words is about one third of that for whole words. The contribution of our study is twofold. First, it opens the way for truncating content terms in Turkish, a language for which no stemming code with morphological analysis capability is publicly available. Second, a truncation algorithm for indexing Turkish text may yield effectiveness comparable to that of a stemming algorithm; hence, given that morphological analyzers for Turkish are highly complex, the need for stemming may become obsolete.
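The measurements described in the abstract (fixed-length truncation, vocabulary growth, and rank-frequency behavior) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the prefix length of 6 is an assumed rounding of the reported 6.2-character average, and the names `turkish_lower`, `tokenize`, `truncate`, `vocabulary_growth` and `rank_frequency` are hypothetical helpers introduced here. Tokenization is simplified to runs of letters.

```python
import re
from collections import Counter

# Assumption: 6 rounds down the 6.2-character average word length reported in the
# abstract; the paper's exact truncation setting may differ.
PREFIX_LEN = 6


def turkish_lower(text: str) -> str:
    """Lowercase using Turkish dotted/dotless i rules.

    str.lower() alone maps 'I' to 'i' and 'İ' to 'i' plus a combining dot,
    so those two letters are mapped explicitly first.
    """
    return text.replace("I", "ı").replace("İ", "i").lower()


def tokenize(text: str) -> list[str]:
    """Split text into runs of Unicode letters (a simplified tokenizer)."""
    return re.findall(r"[^\W\d_]+", turkish_lower(text))


def truncate(word: str, n: int = PREFIX_LEN) -> str:
    """Fixed-length truncation: keep only the first n characters of a word."""
    return word[:n]


def vocabulary_growth(tokens: list[str], step: int) -> list[tuple[int, int, int]]:
    """Return (tokens_seen, distinct_whole_words, distinct_truncated_words)
    every `step` tokens and at the end of the token list."""
    whole, truncated = set(), set()
    points = []
    for i, tok in enumerate(tokens, start=1):
        whole.add(tok)
        truncated.add(truncate(tok))
        if i % step == 0 or i == len(tokens):
            points.append((i, len(whole), len(truncated)))
    return points


def rank_frequency(tokens: list[str]) -> list[tuple[int, str, int]]:
    """Rank-frequency table of truncated terms; under Zipf's law,
    rank * frequency stays roughly constant on a large corpus."""
    counts = Counter(truncate(t) for t in tokens)
    return [(rank, term, freq)
            for rank, (term, freq) in enumerate(counts.most_common(), start=1)]


if __name__ == "__main__":
    toks = tokenize("Kitaplar kitaplarımız kitabı kitapçılar İstanbul'da okuyoruz okuyorlar")
    print(vocabulary_growth(toks, step=4))  # [(4, 4, 3), (8, 8, 6)]
    print(rank_frequency(toks)[:2])
```

On this toy input, eight distinct whole words collapse to six truncated forms; for example, "kitaplar" and "kitaplarımız" both become "kitapl". The same example also shows a limit of plain truncation: "kitabı" is not merged with "kitapl" because Turkish consonant alternation (p → b) changes the stem itself. Meaningful growth-rate or Zipf estimates require a corpus far larger than this example.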

Item type:	Report
Keywords:	Stemming algorithms, truncation, Turkish language, information retrieval, indexing
Subjects:	L. Information technology and library technology > LM. Automatic text retrieval. I. Information treatment for information services > ID. Knowledge representation.
Depositing user:	Prof. Yaşar Tonta
Date deposited:	05 May 2007
Last modified:	02 Oct 2014 12:07
URI:	http://hdl.handle.net/10760/9494
