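Discriminating between Native and Non-native English Documents Using Long Part-of-Speech Strings as Document Features - Kyushu University Collections (九大コレクション)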

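<Departmental Bulletin Paper>
Discriminating between Native and Non-native English Documents Using Long Part-of-Speech Strings as Document Features (長い品詞列を文書特徴とした母語話者英文書・非母語話者英文書の判別)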

Author	Yukino, Kensei (行野, 顕正), Department of Intelligent Systems, Graduate School of Information Science and Electrical Engineering, Kyushu University : Doctoral Program
	Aoki, Sayaka (青木, さやか), Department of Intelligent Systems, Graduate School of Information Science and Electrical Engineering, Kyushu University : Master's Program
	Tanigawa, Ryuji (谷川, 龍司), Department of Intelligent Systems, Graduate School of Information Science and Electrical Engineering, Kyushu University : Master's Program
	Tomiura, Yoichi (冨浦, 洋一), Department of Intelligent Systems, Faculty of Information Science and Electrical Engineering, Kyushu University
Language	Japanese
Publisher	Faculty of Information Science and Electrical Engineering, Kyushu University
Date of issue	2006-09-26
Journal title	Research Reports on Information Science and Electrical Engineering of Kyushu University (九州大学大学院システム情報科学紀要)
Volume	11
Issue	2
Start page	115
End page	119
Publication type	Version of Record
Access rights	open access
JaLC DOI	https://doi.org/10.15017/1516865
Related DOI	https://portal.isee.kyushu-u.ac.jp/
Related URI	https://portal.isee.kyushu-u.ac.jp/
Related information	https://portal.isee.kyushu-u.ac.jp/
Abstract	We propose using long, low-frequency part-of-speech (POS) strings to separate native English documents from non-native English documents. Previous work ignored long POS strings because their frequencies in training data are too small to estimate their probabilities reliably. Meanwhile, research on language identification has shown that long, low-frequency byte strings are useful for distinguishing between similar languages. Language identification and the separation of native from non-native English documents share some similarities, for example, long POS strings are more peculiar to one class than short ones, although POS tags and bytes are different units. We can therefore expect higher accuracy from using long, low-frequency POS strings. The experiments described in this paper show that the proposed method achieves higher accuracy than previous ones.
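
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' method, which this record does not describe in detail. It assumes documents are already tagged with Penn Treebank-style POS tags, and every name in it (`pos_ngrams`, `class_exclusive`, `classify`) and the toy data are hypothetical. The scoring rule, counting POS strings seen in only one class during training, is a simple stand-in chosen to show why long strings that are peculiar to one class can be informative even when they are rare.

```python
from typing import List, Sequence, Set, Tuple

PosNgram = Tuple[str, ...]


def pos_ngrams(tags: Sequence[str], min_n: int, max_n: int) -> Set[PosNgram]:
    """Return the set of POS n-grams of length min_n..max_n in one document."""
    grams: Set[PosNgram] = set()
    for n in range(min_n, max_n + 1):
        for i in range(len(tags) - n + 1):
            grams.add(tuple(tags[i:i + n]))
    return grams


def class_exclusive(
    native_docs: List[Sequence[str]],
    nonnative_docs: List[Sequence[str]],
    min_n: int = 1,
    max_n: int = 5,
) -> Tuple[Set[PosNgram], Set[PosNgram]]:
    """Split training n-grams into those seen only in one class."""
    native = set().union(*(pos_ngrams(d, min_n, max_n) for d in native_docs))
    nonnative = set().union(*(pos_ngrams(d, min_n, max_n) for d in nonnative_docs))
    return native - nonnative, nonnative - native


def classify(
    tags: Sequence[str],
    only_native: Set[PosNgram],
    only_nonnative: Set[PosNgram],
    min_n: int = 1,
    max_n: int = 5,
) -> str:
    """Label a document by which class-exclusive n-grams it contains more of."""
    grams = pos_ngrams(tags, min_n, max_n)
    score = len(grams & only_native) - len(grams & only_nonnative)
    if score > 0:
        return "native"
    if score < 0:
        return "non-native"
    return "undecided"


# Toy training data (hypothetical): each document is a list of POS tags.
native_docs = [
    ["DT", "NN", "VBZ", "DT", "JJ", "NN"],
    ["PRP", "VBZ", "VBN", "DT", "NN"],
]
nonnative_docs = [
    ["DT", "NN", "VB", "DT", "JJ", "NN"],
    ["PRP", "VB", "VBN", "DT", "NN"],
]

only_native, only_nonnative = class_exclusive(native_docs, nonnative_docs)
print(classify(["DT", "NN", "VBZ", "DT", "NN"], only_native, only_nonnative))
print(classify(["PRP", "VB", "DT", "NN"], only_native, only_nonnative))
```

Raising `max_n` adds longer POS strings to the feature set. Each of them occurs rarely, which is the estimation problem the abstract attributes to previous work, so a real system would need a principled way to weight such low-frequency features rather than the raw count used here.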

Full-text file

File	File type	Size	Views	Description
p115	pdf	693 KB	187

Details

PISSN	1342-3819
EISSN	2188-0891
NCID	AN10569524
Record ID	1516865
Peer review	Peer-reviewed
Subject	Document separation (文書の判別)
	Corpus construction (コーパス作成)
	Native English corpus (母語話者コーパス)
	Non-native English corpus (非母語話者コーパス)
	Low-frequent features (低頻度の素性)
Date registered	2015.06.19
Date updated	2020.11.17
