| Author |
|
|
|
|
|
|
|
| Language |
|
| Publisher |
|
|
|
| Date of issue |
|
| Source title |
|
| Volume |
|
| Publication type |
|
| Access rights |
|
| Related DOI |
|
|
|
| Related URI |
|
|
|
| Related information |
|
|
|
| Abstract |
We propose an unsupervised method for detecting spam documents in Web page data, based on equivalence relations on strings. We propose three measures for quantifying the alienness (i.e., how different it is from the others) of each substring equivalence class within a given set of strings. A document is then classified as spam if it contains a characteristic equivalence class as a substring. The proposed method is unsupervised, language-independent, and very efficient. Computational experiments conducted on data collected from Japanese web forums show fairly good results.
|