Creating digital language resources

Nenadić, Goran. Creating digital language resources. Pregled nacionalnog centra za digitalizaciju, 2004, n. 5, pp. 191-30. [Journal article (Paginated)]

English abstract

We discuss building digital language resources (such as annotated corpora, lexicons, ontologies, terminologies, and tools), which are the main prerequisite for successful communication and information management in the e-society of the 21st century. We give an overview of the main requirements and best practices, and point to the steps necessary for creating and maintaining standards-based, reusable language resources for written language. The notions of basic and extended language resource kits are discussed, along with other international initiatives, including the Declaration on Open Access to Language Resources. We also analyze challenges and responsibilities in creating digital language resources, and identify the need for wider national and international coordination and cooperation.

Item type: Journal article (Paginated)
Keywords: language resources, digitization, human language technologies, lexica, corpora, terminologies, open access
Subjects: H. Information sources, supports, channels. > HZ. None of these, but in this section.
Depositing user: Biljana Kosanovic
Date deposited: 04 Jul 2008
Last modified: 02 Oct 2014 12:12
URI: http://hdl.handle.net/10760/11860

