Journal of Guangxi Teachers Education University (Philosophy and Social Sciences Edition) ›› 2021, Vol. 39 ›› Issue (2): 13-20. DOI: 10.16088/j.issn.1001-6600.2020082602


Dynamic Learning Method of Neural Machine Translation Based on Sample Difficulty

WANG Su1,2, FAN Yixing1,2, GUO Jiafeng1,2*, ZHANG Ruqing1,2, CHENG Xueqi1,2

1. Key Laboratory of Network Data Science & Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China
Received: 2020-08-26    Revised: 2020-09-22    Online: 2021-03-25    Published: 2021-04-15

Abstract: In recent years, neural machine translation (NMT) has become the mainstream approach in machine translation. How to learn translation knowledge quickly and accurately from large amounts of training data is a question worth investigating. Training samples differ in difficulty: some are simple and easy for a model to learn, while others are harder and learned less easily. Sample difficulty strongly influences model convergence, yet conventional NMT training ignores this difference. This paper therefore explores how sample difficulty affects the training of NMT models and, building on the idea of “curriculum learning”, proposes a dynamic learning method that takes sample difficulty into account. The difficulty of each training sample is quantified in two ways: by the translation quality the NMT model achieves on it and by its sentence length. Two learning strategies, from-easy-to-difficult and from-difficult-to-easy, are then designed to train the model, and the resulting translation quality is compared. Experimental results show that both dynamic learning strategies improve the translation quality of the NMT model.
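To make the training scheme concrete, below is a minimal Python sketch of the length-based variant described in the abstract. It is not the authors' implementation: the names difficulty and curriculum_batches, the num_stages bucketing scheme, and the (source, target) string-pair corpus format are all illustrative assumptions. The paper's other difficulty signal, the translation quality the model achieves on each sample (cf. BLEU [16]), would simply replace the length proxy in difficulty.

import random

def difficulty(pair):
    """Proxy for sample difficulty: source-sentence length in tokens.
    The model-quality-based score from the paper would replace this."""
    src, _tgt = pair
    return len(src.split())

def curriculum_batches(corpus, num_stages=4, batch_size=64, easy_first=True):
    """Yield batches stage by stage: each stage unlocks one more
    difficulty bucket, so early training sees only the easiest (or,
    with easy_first=False, the hardest) samples, and the final stage
    sees the full training set."""
    ranked = sorted(corpus, key=difficulty, reverse=not easy_first)
    stage_size = max(1, len(ranked) // num_stages)
    for stage in range(1, num_stages + 1):
        # Grow the accessible pool; the last stage always covers everything.
        pool = ranked[:stage * stage_size] if stage < num_stages else list(ranked)
        random.shuffle(pool)  # shuffle within the unlocked pool only
        for i in range(0, len(pool), batch_size):
            yield pool[i:i + batch_size]

# Usage: feed each batch to an NMT training step (model.train_step is
# a placeholder for whatever update routine the framework provides).
corpus = [("a short sentence", "une phrase courte"),
          ("a much longer and harder training sentence to translate", "...")]
for batch in curriculum_batches(corpus, num_stages=2, batch_size=1):
    pass  # model.train_step(batch) would go here

Setting easy_first=False gives the from-difficult-to-easy strategy with no other change, which is why the two curricula can be compared under identical training budgets.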

Key words: neural machine translation, curriculum learning, sample difficulty, dynamic learning

CLC Number: TP391

[1] YE Shaolin. Research on neural machine translation methods based on an attention-based encoder-decoder framework [D]. Hefei: University of Science and Technology of China, 2019. (in Chinese)
[2] WANG R, UTIYAMA M, SUMITA E. Dynamic sentence sampling for efficient training of neural machine translation [C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2018: 298-304.
[3] BENGIO Y, LOURADOUR J, COLLOBERT R, et al. Curriculum learning [C]// Proceedings of the 26th Annual International Conference on Machine Learning. New York, NY: Association for Computing Machinery, 2009: 41-48.
[4] KALCHBRENNER N, BLUNSOM P. Recurrent continuous translation models [C]// Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2013: 1700-1709.
[5] CHO K, VAN MERRIËNBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation [C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2014: 1724-1734.
[6] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks [C]// Proceedings of the 27th International Conference on Neural Information Processing Systems: Volume 2. Cambridge, MA: MIT Press, 2014: 3104-3112.
[7] BAHDANAU D, CHO K H, BENGIO Y. Neural machine translation by jointly learning to align and translate [EB/OL]. (2016-05-19) [2020-08-26]. https://arxiv.org/pdf/1409.0473.pdf.
[8] WU Y H, SCHUSTER M, CHEN Z F, et al. Google's neural machine translation system: Bridging the gap between human and machine translation [EB/OL]. (2016-10-08) [2020-08-26]. https://arxiv.org/pdf/1609.08144.pdf.
[9] GEHRING J, AULI M, GRANGIER D, et al. Convolutional sequence to sequence learning [J]. Proceedings of Machine Learning Research, 2017, 70: 1243-1252.
[10] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
[11] TSVETKOV Y, FARUQUI M, LING W, et al. Learning the curriculum with Bayesian optimization for task-specific word representation learning [C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: Association for Computational Linguistics, 2016: 130-139. DOI: 10.18653/v1/P16-1013.
[12] CIRIK V, HOVY E, MORENCY L P. Visualizing and understanding curriculum learning for long short-term memory networks [EB/OL]. (2016-11-18) [2020-08-26]. https://arxiv.org/pdf/1611.06204.pdf.
[13] KOCMI T, BOJAR O. Curriculum learning and minibatch bucketing in neural machine translation [EB/OL]. (2017-07-29) [2020-08-26]. https://arxiv.org/pdf/1707.09533v1.pdf.
[14] ZHANG X, KUMAR G, KHAYRALLAH H, et al. An empirical exploration of curriculum learning for neural machine translation [EB/OL]. (2018-11-02) [2020-08-26]. https://arxiv.org/pdf/1811.00739.pdf.
[15] KUDO T, RICHARDSON J. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing [C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Stroudsburg, PA: Association for Computational Linguistics, 2018: 66-71. DOI: 10.18653/v1/D18-2012.
[16] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation [C]// Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2002: 311-318. DOI: 10.3115/1073083.1073135.