Unsupervised Spam Detection based on String Alienness Measures - Kyudai Collection | Kyushu University Library

＜Technical Report＞
Unsupervised Spam Detection based on String Alienness Measures

Creator	Author ID: L002949; Name: Narisawa, Kazuyuki (成澤, 和志); Affiliation: Department of Informatics, Kyushu University
	Author ID: K002686; Name: Bannai, Hideo (坂内, 英夫); Affiliation: Department of Informatics, Kyushu University
	Author ID: 100021267; Name: Hatano, Kohei (畑埜, 晃平); Affiliation: Department of Informatics, Kyushu University
	Author ID: K000172; Name: Takeda, Masayuki (竹田, 正幸); Affiliation: Department of Informatics, Kyushu University
Language	English
Publisher	Department of Informatics, Kyushu University
Date issued	2007-01
Source title	DOI Technical Report
Volume	229
Publication type	Accepted Manuscript
Access rights	open access
Related information	DOI Technical Report, vol. 229, pp. 1-9
Related information	http://www.i.kyushu-u.ac.jp/research/report.html
Abstract	We propose an unsupervised method for detecting spam documents from Web page data, based on equivalence relations on strings. We introduce three measures for quantifying the alienness (i.e. how different a class is from the others) of substring equivalence classes within a given set of strings. A document is then classified as spam if it contains a characteristic equivalence class as a substring. The proposed method is unsupervised, language-independent, and very efficient. Computational experiments conducted on data collected from Japanese web forums show fairly good results.

Full-text file

File	File type	Terms of use	Size	Views	Description
trcs229	pdf	None	477 KB	749

Details

Record ID	3424
Peer review	Not peer-reviewed
Subject	Spam Detection
Subject	Equivalence Class
Type	Technical Report
Date registered	2009.04.22
Date updated	2017.01.24