Evaluating Google Neural Machine Translation from Chinese to English: Technical vs. Literary Texts

Zhongming Zhang; Syed Nurulakla Syed Abdullah; Muhammad Alif Redzuan Abdullah; Wenqi Duan

doi:10.17576/gema-2025-2503-09

Abstract

As the global need for translation grows, machine translation (MT) has made information dissemination and cross-cultural communication markedly more efficient. However, its quality remains constrained by limitations that vary across language pairs and text genres. As a result, MT performs differently on technical and literary texts, and this discrepancy is the gap the present study addresses. This study compares the quality of Google Neural Machine Translation (GNMT) on literary and technical texts, investigating differences in error patterns and establishing the abilities and limits of MT across diverse linguistic contexts. The research focused on the English-Chinese language pair, using the Multidimensional Quality Metrics (MQM) framework for manual annotation. The COMET automatic evaluation metric was also applied to validate the observed quality differences. Five excerpts were selected from each of two sources: Apple product manuals (33 aligned units) and the novel The Old Man and the Sea (32 aligned units). The findings were as follows: (1) GNMT performed well on technical texts but was less effective on literary texts; technical texts exhibited notable terminology errors, whereas literary texts showed more stylistic inconsistencies; (2) MQM scores differed markedly, with technical texts outperforming literary texts by 18.57%; and (3) the COMET evaluation validated these observations, confirming a significant difference between the two text types. Although GNMT faced challenges with both text types, its quality remained acceptable within this study. The results suggest that GNMT algorithms should be improved to enhance accuracy and address the observed error patterns and distributions.
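To make the MQM comparison concrete, the following is a minimal sketch of how severity-weighted error annotations can be turned into a quality score and how a percentage gap between two text types can be computed. It is not the authors' scoring procedure: the severity weights, the per-100-words normalisation, the relative-difference formula, and all annotation data below are assumptions chosen for illustration only.

```python
from dataclasses import dataclass

# Assumed severity weights; the abstract does not state the weights used in the study.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


@dataclass
class ErrorAnnotation:
    category: str  # e.g. "terminology" or "style"
    severity: str  # must be a key of SEVERITY_WEIGHTS


def mqm_score(errors, word_count):
    """Return a 0-100 score: 100 minus penalty points per 100 words (assumed formula)."""
    penalty = sum(SEVERITY_WEIGHTS[e.severity] for e in errors)
    return max(0.0, 100.0 - 100.0 * penalty / word_count)


def relative_difference(score_a, score_b):
    """Percentage by which score_a exceeds score_b (one possible reading of 'outperforming by X%')."""
    return 100.0 * (score_a - score_b) / score_b


# Hypothetical annotations, NOT data from the study.
technical_errors = [
    ErrorAnnotation("terminology", "major"),
    ErrorAnnotation("terminology", "minor"),
]
literary_errors = [
    ErrorAnnotation("style", "major"),
    ErrorAnnotation("style", "major"),
    ErrorAnnotation("accuracy", "minor"),
]

technical_score = mqm_score(technical_errors, word_count=200)
literary_score = mqm_score(literary_errors, word_count=200)
gap_percent = relative_difference(technical_score, literary_score)

print(f"technical={technical_score:.2f} literary={literary_score:.2f} gap={gap_percent:.2f}%")
```

Keeping the weights in a single dictionary makes it easy to swap in whatever severity scheme a given MQM annotation guideline prescribes without touching the scoring function.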

Keywords

Google Neural Machine Translation (GNMT); Translation Quality Evaluation; Technical and Literary Texts; Multidimensional Quality Metrics (MQM); COMET Metric
