Analysis of errors in the automatic translation of questions for translingual QA systems

Olvera-Lobo, María Dolores and García Santiago, María Dolores. Analysis of errors in the automatic translation of questions for translingual QA systems. Journal of Documentation, 2010, vol. 66, n. 3, pp. 434-455. [Journal article (Paginated)]


English abstract

Purpose – This study evaluates systems for the automatic translation of questions intended for translingual question-answering (QA) systems. The efficacy of online translators as tools in QA systems is analysed using a collection of documents in Spanish.

Design/methodology/approach – Automatic translation is evaluated in terms of the functionality of the actual translations produced by three online translators (Google Translator, Promt Translator, and Worldlingo), using objective and subjective evaluation measures, and the typology of the errors produced is identified. To this end, a comparative study was carried out of the quality of translation, from German and French into Spanish, of factual questions from the CLEF collection of queries.

Findings – The error rates of all three systems are higher for the German-Spanish language pair. Promt was identified as the most reliable translator of the three (on average) for the two language combinations evaluated. However, for the Spanish-German pair, the Google online translator was also assessed favourably. The largest share of errors (46.38 percent) was lexical, followed by errors due to poor translation of the interrogative particle of the query (31.16 percent).

Originality/value – The evaluation methodology focuses above all on the purpose of the translation: does the resulting question serve as effective input to a translingual QA system? Thus, rather than seeking "perfection", it appraises the functionality of the question and its capacity to lead to an adequate answer. The results contribute to the development of improved translingual QA systems.
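The abstract refers to objective evaluation measures without naming them. As an illustration only, the sketch below computes word error rate (WER), a common objective measure in machine translation evaluation: the word-level edit distance between a reference translation and a system output, divided by the reference length. The function name and the example sentences are hypothetical and are not taken from the study or its data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenisation, unit cost for every
    insertion, deletion and substitution.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical question pair: the interrogative word is mistranslated.
reference = "quién es el presidente de francia"
hypothesis = "que es el presidente de francia"
print(word_error_rate(reference, hypothesis))
```

A score of this kind measures surface similarity only. It does not by itself indicate whether the translated question would still work as input to a QA system, which is the functional criterion the abstract emphasises.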

Item type: Journal article (Paginated)
Keywords: Translation services, Computer applications, Knowledge management, Languages, Error analysis, Quality improvement
Subjects: L. Information technology and library technology
Depositing user: Maria Dolores/ M.D. Olvera Lobo
Date deposited: 23 Jul 2018 11:44
Last modified: 23 Jul 2018 11:44
URI: http://hdl.handle.net/10760/32869
