广西师范大学学报(哲学社会科学版) ›› 2021, Vol. 39 ›› Issue (2): 13-20.doi: 10.16088/j.issn.1001-6600.2020082602

• CCIR2020 •

基于样本难度的神经机器翻译动态学习方法

王素1,2, 范意兴1,2, 郭嘉丰1,2*, 张儒清1,2, 程学旗1,2   

  1. 中国科学院 计算技术研究所 网络数据科学与技术重点实验室, 北京 100190;
    2.中国科学院大学, 北京 100049
  • 收稿日期:2020-08-26 修回日期:2020-09-22 出版日期:2021-03-25 发布日期:2021-04-15
  • Corresponding author: GUO Jiafeng (1980—), male, born in Jiangyin, Jiangsu; researcher and doctoral supervisor, Chinese Academy of Sciences. E-mail: guojiafeng@ict.ac.cn
  • Supported by:
    Beijing Academy of Artificial Intelligence (BAAI2019ZD0306); National Natural Science Foundation of China (61722211, 61872338, 61902381); Youth Innovation Promotion Association of the Chinese Academy of Sciences (20144310); National Key R&D Program of China (2016QY02D0405); Lenovo-CAS Joint Lab Youth Scientist Project; Chongqing Research Program of Basic Research and Frontier Technology (cstc2017jcjyBX0059); Taishan Scholar Program (ts201511082)

Dynamic Learning Method of Neural Machine Translation Based on Sample Difficulty

WANG Su1,2, FAN Yixing1,2, GUO Jiafeng1,2*, ZHANG Ruqing1,2, CHENG Xueqi1,2

  1. Key Laboratory of Network Data Science & Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
    2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received:2020-08-26 Revised:2020-09-22 Online:2021-03-25 Published:2021-04-15

摘要: 近年来,神经机器翻译模型已经成为机器翻译领域的主流模型,如何从大量的训练数据中快速、准确地学习翻译知识是一个值得探讨的问题。不同训练样本的难易程度不同,样本的难易程度对模型的收敛性有极大影响,但是传统的神经机器翻译模型在训练过程中并没有考虑这种差异性。本文探究样本的难易程度对神经机器翻译模型训练过程的影响,基于“课程学习”的思想,为神经机器翻译模型提出了一种基于样本难度的动态学习方法:分别从神经机器翻译模型的翻译效果和训练样本的句子长度2方面量化训练样本的难易程度;设计了由易到难和由难到易2种学习策略训练模型,并比较模型的翻译效果。

关键词: 神经机器翻译, 课程学习, 样本难度, 动态学习

Abstract: In recent years, the neural machine translation (NMT) model has become the mainstream model in the field of machine translation, and how to learn translation knowledge quickly and accurately from a large amount of training data is a problem worth exploring. Training samples differ in difficulty: some are simple and easy for the model to learn, while others are more difficult. The difficulty of the samples has a great influence on the convergence of the model, yet the traditional NMT model does not take this difference into account during training. Therefore, this paper explores the influence of sample difficulty on the training process of the NMT model. Based on the idea of "curriculum learning", a dynamic learning method driven by sample difficulty is proposed for NMT. The difficulty of the training samples is quantified in two ways: by the translation quality the model achieves on a sample and by the sentence length of the sample. Two learning strategies, from easy to difficult and from difficult to easy, are then designed to train the model, and their translation results are compared. The experimental results show that both dynamic learning strategies can improve the translation quality of the NMT model.
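As a rough illustration of the approach the abstract describes, the sketch below scores training pairs by one of the two difficulty measures mentioned (sentence length) and yields minibatches from easy to hard or hard to easy. This is a hypothetical sketch, not the authors' implementation: the function names and toy corpus are invented for demonstration, and the paper's second difficulty measure, based on the model's own translation quality, is omitted.

```python
# Minimal sketch of length-based curriculum scheduling for NMT training data.
# Assumption: difficulty is approximated by source-sentence token count; the
# paper also uses the model's translation quality, which is not modeled here.

def length_difficulty(pair):
    """Score a (source, target) pair by source sentence length in tokens."""
    source, _ = pair
    return len(source.split())

def curriculum_batches(pairs, batch_size, easy_first=True):
    """Yield minibatches ordered from easy to hard (or hard to easy)."""
    ordered = sorted(pairs, key=length_difficulty, reverse=not easy_first)
    for i in range(0, len(ordered), batch_size):
        yield ordered[i:i + batch_size]

# Toy parallel corpus (source, target) -- targets elided for brevity.
corpus = [
    ("hello world", "..."),
    ("a much longer source sentence with many tokens", "..."),
    ("short", "..."),
]

batches = list(curriculum_batches(corpus, batch_size=2))
# easy-first ordering: the first batch holds the shortest sentences
```

Passing `easy_first=False` gives the reverse (difficult-to-easy) schedule, the second strategy the abstract compares; a real training loop would re-derive difficulty scores as the model improves rather than fixing the order once.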

Key words: neural machine translation, curriculum learning, sample difficulty, dynamic learning

CLC number: TP391
[1] YE S L. Research on neural machine translation based on an attention-based encoder-decoder framework[D]. Hefei: University of Science and Technology of China, 2019. (in Chinese)
[2] WANG R, UTIYAMA M, SUMITA E. Dynamic sentence sampling for efficient training of neural machine translation[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2018: 298-304.
[3] BENGIO Y, LOURADOUR J, COLLOBERT R, et al. Curriculum learning[C]// Proceedings of the 26th Annual International Conference on Machine Learning. New York, NY: Association for Computing Machinery, 2009: 41-48.
[4] KALCHBRENNER N, BLUNSOM P. Recurrent continuous translation models[C]// Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2013: 1700-1709.
[5] CHO K, VAN MERRIËNBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2014: 1724-1734.
[6] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks[C]// Proceedings of the 27th International Conference on Neural Information Processing Systems: Volume 2. Cambridge, MA: MIT Press, 2014: 3104-3112.
[7] BAHDANAU D, CHO K H, BENGIO Y. Neural machine translation by jointly learning to align and translate[EB/OL]. (2016-05-19)[2020-08-26]. https://arxiv.org/pdf/1409.0473.pdf.
[8] WU Y H, SCHUSTER M, CHEN Z F, et al. Google's neural machine translation system: Bridging the gap between human and machine translation[EB/OL]. (2016-10-08)[2020-08-26]. https://arxiv.org/pdf/1609.08144.pdf.
[9] GEHRING J, AULI M, GRANGIER D, et al. Convolutional sequence to sequence learning[J]. Proceedings of Machine Learning Research, 2017, 70: 1243-1252.
[10] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
[11] TSVETKOV Y, FARUQUI M, LING W, et al. Learning the curriculum with Bayesian optimization for task-specific word representation learning[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: Association for Computational Linguistics, 2016: 130-139. DOI: 10.18653/v1/P16-1013.
[12] CIRIK V, HOVY E, MORENCY L P. Visualizing and understanding curriculum learning for long short-term memory networks[EB/OL]. (2016-11-18)[2020-08-26]. https://arxiv.org/pdf/1611.06204.pdf.
[13] KOCMI T, BOJAR O. Curriculum learning and minibatch bucketing in neural machine translation[EB/OL]. (2017-07-29)[2020-08-26]. https://arxiv.org/pdf/1707.09533v1.pdf.
[14] ZHANG X, KUMAR G, KHAYRALLAH H, et al. An empirical exploration of curriculum learning for neural machine translation[EB/OL]. (2018-11-02)[2020-08-26]. https://arxiv.org/pdf/1811.00739.pdf.
[15] KUDO T, RICHARDSON J. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Stroudsburg, PA: Association for Computational Linguistics, 2018: 66-71. DOI: 10.18653/v1/D18-2012.
[16] PAPINENI K, ROUKOS S, WARD T, et al. Bleu: a method for automatic evaluation of machine translation[C]// Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2002: 311-318. DOI: 10.3115/1073083.1073135.