Analysis of errors in the automatic translation of questions for translingual QA systems

Olvera-Lobo, María-Dolores and García-Santiago, Lola. Analysis of errors in the automatic translation of questions for translingual QA systems. Journal of Documentation, 2010, vol. 66, no. 3, pp. 434-455. [Journal article (Paginated)]

English abstract

Purpose – This study evaluates systems for the automatic translation of questions intended as input to translingual question-answering (QA) systems. The efficacy of online translators as components of QA systems is analysed using a Spanish-language document collection.

Design/methodology/approach – Automatic translation is evaluated in terms of the functionality of the actual translations produced by three online translators (Google Translator, Promt Translator and Worldlingo), using both objective and subjective evaluation measures, and the typology of the errors produced is identified. To this end, a comparative study was carried out of the quality of the translation of factual questions from the CLEF query collection, from German and French into Spanish.

Findings – For all three systems, error rates were higher for translations in the German-Spanish language pair. On average, Promt was the most reliable of the three translators for both language combinations evaluated. However, Google's online translator also performed well for the German-Spanish pair. Most errors (46.38 percent) were lexical, followed by errors arising from poor translation of the interrogative particle of the question (31.16 percent).

Originality/value – The evaluation methodology focuses above all on the purpose of the translation: does the resulting question serve as effective input to a translingual QA system? Rather than seeking “perfection”, it appraises the functionality of the translated question and its capacity to lead to an adequate answer. The results contribute to the development of improved translingual QA systems.
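The abstract does not say which objective measures were used. Purely as an illustration, here is a minimal sketch of word error rate (WER), a widely used objective MT measure: the word-level Levenshtein edit distance between a candidate translation and a reference translation, divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenisation and the example questions are hypothetical and are not taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level Levenshtein distance / reference length.

    Hypothetical helper; tokenises naively on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words (dynamic programming, row by row).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # delete reference word r
                          curr[j - 1] + 1,     # insert hypothesis word h
                          prev[j - 1] + cost)  # substitute, or match if equal
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: a mistranslated interrogative particle
    # counts as a single substitution out of three reference words.
    print(round(word_error_rate("¿Cuándo murió Napoleón?",
                                "¿Qué murió Napoleón?"), 3))  # 0.333
```

A surface measure like this scores the wrong interrogative particle above as just one word in three, even though it changes the kind of answer the question asks for. That gap is one reason a functional, human judgement of whether the translated question still leads to an adequate answer is useful alongside such metrics.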

Item type: Journal article (Paginated)
Keywords: Translation services, Computer applications, Knowledge management, Languages, Translation, Knowledge, Error analysis, Quality improvement
Subjects: L. Information technology and library technology
Depositing user: Maria Dolores/ M.D. Olvera Lobo
Date deposited: 15 Nov 2010
Last modified: 02 Oct 2014 12:17
URI: http://hdl.handle.net/10760/15092
