Associative measures and multi-word unit extraction in Turkish

Mersinli, Ümit. Associative measures and multi-word unit extraction in Turkish. Dil ve Edebiyat Dergisi, 2015, vol. 12, n. 1, pp. 43-61. [Journal article (Paginated)]

umit_mersinli_turkish_mwe.pdf - Published version

English abstract

Associative measures are “mathematical formulas determining the strength of association between two or more words based on their occurrences and cooccurrences in a text corpus” (Pecina, 2010, p. 138). The purpose of this paper is to test the 12 associative measures contained in Text-NSP (Banerjee & Pedersen, 2003) on a 10-million-word subcorpus of the Turkish National Corpus (TNC) (Aksan et al., 2012). A statistical comparison of those measures is outside the scope of the study; the measures are instead evaluated according to the linguistic relevance of the rankings they provide. The study focuses on optimizing the corpus data before applying the measures and then on evaluating the rankings these measures produce as a whole, not on the linguistic relevance of individual n-grams. The findings include intra-linguistically relevant associative measures for a comma-delimited, sentence-split, lower-cased, well-balanced, representative, 10-million-word corpus of Turkish.
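As a rough illustration of what an associative measure computes, the sketch below scores bigrams with pointwise mutual information (PMI), one widely used measure. It is not the paper's pipeline and does not call Text-NSP. The function names, the punctuation set used for splitting, and the toy input are all assumptions made for this example. The preprocessing only mimics the abstract's description of the corpus: text is lower-cased, and bigrams are not allowed to cross sentence or comma boundaries.

```python
import math
import re
from collections import Counter


def bigrams_within_segments(text):
    """Yield adjacent word pairs that do not cross sentence or comma boundaries.

    Assumption: str.lower() is used for simplicity; real Turkish text needs
    locale-aware lower-casing (dotted and dotless i), which this sketch ignores.
    """
    segments = re.split(r"[.!?,;:]+", text.lower())
    for segment in segments:
        tokens = segment.split()
        yield from zip(tokens, tokens[1:])


def pmi_scores(text):
    """Return {(w1, w2): PMI} using PMI = log2(observed / expected)."""
    pairs = list(bigrams_within_segments(text))
    total = len(pairs)
    pair_counts = Counter(pairs)
    left_counts = Counter(w1 for w1, _ in pairs)    # w1 in first position
    right_counts = Counter(w2 for _, w2 in pairs)   # w2 in second position
    scores = {}
    for (w1, w2), observed in pair_counts.items():
        # Expected count if w1 and w2 occurred independently of each other.
        expected = left_counts[w1] * right_counts[w2] / total
        scores[(w1, w2)] = math.log2(observed / expected)
    return scores


if __name__ == "__main__":
    toy_text = "a b. a b, c a. c b."  # hypothetical toy input, not TNC data
    ranked = sorted(pmi_scores(toy_text).items(), key=lambda kv: kv[1], reverse=True)
    for pair, score in ranked:
        print(pair, round(score, 3))
```

Sorting the scores in descending order gives the kind of ranking the abstract refers to. Different measures applied to the same counts generally produce different rankings, which is what makes their linguistic relevance worth comparing.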

Item type: Journal article (Paginated)
Keywords: Multi-word units, associative measures, Turkish National Corpus
Subjects: L. Information technology and library technology > LL. Automated language processing.
Depositing user: Ümit Mersinli
Date deposited: 15 Jul 2015 12:08
Last modified: 15 Jul 2015 12:08
URI: http://hdl.handle.net/10760/25489
