<Technical Report>
Unsupervised Spam Detection based on String Alienness Measures

Author
Language
Publisher
Date of issue
Journal title
Publication type
Access rights
Abstract We propose an unsupervised method for detecting spam documents in Web page data, based on equivalence relations on strings. We introduce three measures for quantifying the alienness (i.e., how different it is from the others) of substring equivalence classes within a given set of strings. A document is then classified as spam if it contains a characteristic equivalence class as a substring. The proposed method is unsupervised, language-independent, and very efficient. Computational experiments on data collected from Japanese web forums show fairly good results.

trcs229 pdf 477 KB 54  

Details

Record ID
Peer reviewed
Related information
Subject
Type
Date registered 2009.04.22
Date updated 2017.01.24