ABSTRACT

Text corpora are among the most vital linguistic resources in natural language processing (NLP). They include newspaper corpora, such as the Wall Street Journal Corpus [128] and the Mainichi Shinbun Corpus [437]; conversation corpora, such as the BC3 corpus [311] and the CSJ corpus [151]; and corpora of literature, such as Aozora Bunko [29]. The importance of corpora is widely recognized, and numerous corpora have been compiled for different languages. However, compared with major world languages such as English, few large corpora are available for Japanese [206]. Moreover, the vast majority of them are based on newspapers or legal documents. Unfortunately, such corpora are usually unsuitable for research on emotion processing, as emotions are rarely expressed in these kinds of text. Although speech corpora exist that could be suitable for emotion processing research, such as the Corpus of Spontaneous Japanese [151], they are relatively small because such corpora are difficult to compile.