Mersinli, Ümit Associative measures and multi-word unit extraction in Turkish. Dil ve Edebiyat Dergisi, 2015, vol. 12, n. 1, pp. 43-61. [Journal article (Paginated)]
English abstract
Associative measures are “mathematical formulas determining the strength of association between two or more words based on their occurrences and cooccurrences in a text corpus” (Pecina, 2010, p. 138). The purpose of this paper is to test the 12 associative measures included in Text-NSP (Banerjee & Pedersen, 2003) on a 10-million-word subcorpus of the Turkish National Corpus (TNC) (Aksan et al., 2012). A statistical comparison of those measures is beyond the scope of the study; the measures are instead evaluated according to the linguistic relevance of the rankings they provide. The study focuses on optimizing the corpus data before applying the measures and then on evaluating the rankings produced by these measures as a whole, not on the linguistic relevance of individual n-grams. The findings include intra-linguistically relevant associative measures for a comma-delimited, sentence-split, lower-cased, well-balanced, representative, 10-million-word corpus of Turkish.
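To make the notion of an associative measure concrete, the following is a minimal sketch of one widely used measure, pointwise mutual information (PMI), computed over adjacent word pairs. It is an illustration only: the abstract does not list the 12 measures tested, the function name `bigram_pmi` and the toy token list are hypothetical, and the code does not reproduce Text-NSP or the paper's corpus preparation.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI in bits} for every adjacent token pair.

    PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) ), with probabilities
    estimated by relative frequency in the token list (an assumption of
    this sketch, not a description of the paper's method).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy input, already lower-cased and tokenized on whitespace.
# Note: Python's str.lower() is not Turkish-locale aware (I/ı, İ/i), so a
# real pipeline would need its own case-folding step.
tokens = "türk dil kurumu ve türk dil kurumu için dil".split()

# Rank candidate two-word units from strongest to weakest association.
ranking = sorted(bigram_pmi(tokens).items(), key=lambda kv: kv[1], reverse=True)
for pair, score in ranking:
    print(pair, round(score, 3))
```

A ranking of this kind is what the study evaluates as a whole. PMI is known to favour rare pairs, which is one reason several measures are usually compared rather than relying on a single one.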
Item type: | Journal article (Paginated) |
---|---|
Keywords: | Multi-word units, associative measures, Turkish National Corpus |
Subjects: | L. Information technology and library technology > LL. Automated language processing. |
Depositing user: | Ümit Mersinli |
Date deposited: | 15 Jul 2015 12:08 |
Last modified: | 15 Jul 2015 12:08 |
URI: | http://hdl.handle.net/10760/25489 |