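<Departmental Bulletin Paper>
Discriminating Native-Speaker and Non-Native-Speaker English Documents Using Long Part-of-Speech Strings as Document Features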

Creator
Language
Publisher
Date issued
Source title
Start page
End page
Publication type
Access rights
JaLC DOI
Related DOI
Related URI
Related information
Abstract We propose using long, low-frequency part-of-speech (POS) strings to separate native English documents from non-native English documents. Previous work ignored long POS strings because they occur too rarely in training data for their probabilities to be estimated. Meanwhile, a study on language identification showed that long, low-frequency byte strings are useful for distinguishing similar languages. Language identification and native/non-native document separation share some similarities: for example, long POS strings are more peculiar to one class than short ones, although POS tags and bytes are different units. We can therefore expect higher accuracy from using long, low-frequency POS strings. The experiments described in this paper show that the proposed method achieves higher accuracy than previous methods.
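The observation that long POS strings tend to be more peculiar to one class than short ones can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not describe the tagger, the probability estimates, or the classifier, so the function names (`pos_ngrams`, `class_counts`, `peculiar_strings`) and the hand-written tag sequences are hypothetical and serve only as a toy example.

```python
from collections import Counter


def pos_ngrams(tags, min_len, max_len):
    """Yield every contiguous POS string (as a tuple) of length min_len..max_len."""
    for n in range(min_len, max_len + 1):
        for i in range(len(tags) - n + 1):
            yield tuple(tags[i:i + n])


def class_counts(documents, min_len, max_len):
    """Count POS strings over all documents belonging to one class."""
    counts = Counter()
    for tags in documents:
        counts.update(pos_ngrams(tags, min_len, max_len))
    return counts


def peculiar_strings(counts_a, counts_b):
    """Return the POS strings seen in class A but never in class B."""
    return {s for s in counts_a if s not in counts_b}


# Hand-written toy tag sequences (assumed for illustration, not data from the paper).
native = [["DT", "JJ", "NN", "VBZ", "DT", "NN"]]
non_native = [["DT", "NN", "JJ", "VBZ", "DT", "NN"]]

# Length-1 strings: both classes use the same tags, so none is peculiar.
short_only_native = peculiar_strings(
    class_counts(native, 1, 1), class_counts(non_native, 1, 1)
)

# Length-3 strings: word-order differences make several strings class-specific.
long_only_native = peculiar_strings(
    class_counts(native, 3, 3), class_counts(non_native, 3, 3)
)
```

In this toy data the single tags are shared by both classes, while several length-3 strings appear in only one class. Each such string occurs only once, which is the low-frequency problem the abstract mentions.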

Full-text file

pdf p115 pdf 693 KB 178  

Details

PISSN
EISSN
NCID
Record ID
Peer reviewed
Subject
Date registered 2015.06.19
Date updated 2020.11.17
