
2020, Vol. 8, Issue 1

Research Article

31 March 2020, pp. 19-26
Abstract
In this paper, we present a comparative study of language models based on long short-term memory (LSTM) and propose an LSTM language model that uses GloVe (Global Vectors) as its word representation. First, a traditional n-gram statistical language model is compared with an LSTM language model on the PTB English corpus; the LSTM model reduces perplexity (ppl) by 47.3% relative to the n-gram baseline. To extend the approach to Korean, we design a language model whose basic token unit is produced by the word-piece model (WPM), and again compare the statistical n-gram language model with the neural language model. In particular, we propose using GloVe vectors as the word representation of the LSTM language model. On a test set of 100,000 Korean sentences, the LSTM language model reduces perplexity by 28.8% relative to the n-gram model, and the LSTM model combined with GloVe reduces it by 43.4%. The experiments on both the English and Korean corpora show that the proposed GloVe-based LSTM language model is an effective approach.
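
The percentages above are relative perplexity reductions. As a reminder (these are the standard definitions, not spelled out in the abstract), the perplexity of a model over a held-out corpus of $N$ tokens, and the relative reduction of model $A$ over baseline $B$, are

$$
\mathrm{ppl} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log P(w_i \mid w_1,\dots,w_{i-1})\right),
\qquad
\Delta = \frac{\mathrm{ppl}_{B}-\mathrm{ppl}_{A}}{\mathrm{ppl}_{B}}\times 100\%.
$$

For example, with a hypothetical n-gram baseline at ppl 141 and an LSTM model at ppl 74.3, the reduction is (141 − 74.3)/141 ≈ 47.3%, the figure reported for PTB; the paper's actual perplexity values are not given in this preview.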
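For readers who want to see the shape of the proposed model, below is a minimal PyTorch sketch of an LSTM language model whose embedding layer is initialized from pretrained GloVe vectors, together with a helper that computes held-out perplexity. All names, layer sizes, and the single-layer configuration are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    # Minimal sketch: embedding (optionally GloVe-initialized) -> LSTM -> logits over vocab.
    def __init__(self, vocab_size, embed_dim, hidden_dim, glove_weights=None):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        if glove_weights is not None:
            # glove_weights: a (vocab_size, embed_dim) tensor of pretrained GloVe
            # vectors, aligned with the tokenizer's (e.g., WPM) vocabulary.
            self.embed.weight.data.copy_(glove_weights)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids, hidden=None):
        x = self.embed(token_ids)           # (batch, seq_len, embed_dim)
        out, hidden = self.lstm(x, hidden)  # (batch, seq_len, hidden_dim)
        return self.proj(out), hidden       # logits over the vocabulary

def perplexity(model, token_ids):
    # Perplexity = exp(mean token-level cross-entropy) on next-token prediction.
    model.eval()
    with torch.no_grad():
        logits, _ = model(token_ids[:, :-1])
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            token_ids[:, 1:].reshape(-1),
        )
    return torch.exp(loss).item()

# Illustrative usage with random data (sizes are assumptions, not from the paper):
model = LSTMLanguageModel(vocab_size=10000, embed_dim=300, hidden_dim=512)
batch = torch.randint(0, 10000, (8, 32))   # 8 sequences of 32 WPM token ids
print(perplexity(model, batch))
```

Training would minimize the same cross-entropy loss; whether the GloVe embeddings are kept frozen or fine-tuned during training is a design choice the preview does not specify.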
Information
  • Publisher: The Society of Convergence Knowledge
  • Publisher (Ko): 융복합지식학회
  • Journal Title: The Society of Convergence Knowledge Transactions
  • Journal Title (Ko): 융복합지식학회논문지
  • Volume: 8
  • No.: 1
  • Pages: 19-26