<technical report>
Unsupervised Spam Detection based on String Alienness Measures

Abstract
We propose an unsupervised method for detecting spam documents in Web page data, based on equivalence relations on strings. We propose three measures for quantifying the alienness (i.e. how different it... is from others) of substring equivalence classes within a given set of strings. A document is then classified as spam if it contains, as a substring, a member of a characteristic equivalence class. The proposed method is unsupervised, language-independent, and very efficient. Computational experiments conducted on data collected from Japanese web forums show fairly good results.
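The abstract does not define the equivalence relation or the three alienness measures, so the following is only a naive sketch under stated assumptions, not the report's method. It assumes (hypothetically) that two substrings are equivalent when they occur in exactly the same set of documents. It then scores each class with a made-up alienness function: the length of its longest member, counted only if the class spans at least two documents, since long strings repeated across posts are typical of copy-paste spam. A document is flagged if it contains a member of a class whose score reaches a threshold. The helper names (`substring_doc_sets`, `equivalence_classes`, `alienness`, `detect_spam`) and the parameters `threshold` and `min_len` are illustrative. Enumeration here is quadratic per document; a practical implementation would more likely rely on an index such as a suffix tree.

```python
from collections import defaultdict


def substring_doc_sets(docs, min_len=1):
    """Map each distinct substring to the frozenset of indices of the documents containing it.

    Naive enumeration of all substrings: quadratic in each document's length.
    """
    occ = defaultdict(set)
    for i, doc in enumerate(docs):
        n = len(doc)
        for start in range(n):
            for end in range(start + min_len, n + 1):
                occ[doc[start:end]].add(i)
    return {sub: frozenset(ids) for sub, ids in occ.items()}


def equivalence_classes(docs, min_len=1):
    """ASSUMED relation: substrings are equivalent iff they occur in exactly the same documents."""
    classes = defaultdict(list)
    for sub, ids in substring_doc_sets(docs, min_len).items():
        classes[ids].append(sub)
    return classes


def alienness(members, doc_ids):
    """HYPOTHETICAL score (not one of the report's three measures):
    length of the longest member, or 0 if the class occurs in fewer than two documents."""
    if len(doc_ids) < 2:
        return 0
    return max(len(m) for m in members)


def detect_spam(docs, threshold, min_len=1):
    """Return sorted indices of documents containing a member of a class scoring >= threshold."""
    flagged = set()
    for doc_ids, members in equivalence_classes(docs, min_len).items():
        if alienness(members, doc_ids) >= threshold:
            # Every document in doc_ids contains every member of this class.
            flagged |= doc_ids
    return sorted(flagged)


if __name__ == "__main__":
    posts = [
        "buy cheap pills now at example shop",
        "i think the movie was great",
        "buy cheap pills now at example shop!!",
        "the weather is nice today",
    ]
    print(detect_spam(posts, threshold=20))  # → [0, 2]
```

Because the sketch works on raw characters, it needs no tokenizer, which is in the spirit of the language independence claimed above; the threshold value is arbitrary and would have to be tuned on real data.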

Full text: trcs229.pdf (PDF, 477 KB)

Details

Created Date 2009.04.22
Modified Date 2017.01.24
