Spam Filtering by Quantitative Profiles
In the quantitative profile approach to spam filtering and email categorization, an email is represented not by a 'bag of words' but by an m-dimensional vector of numbers, with m fixed in advance. Inspired by the email shape analysis recently proposed by Sroufe et al., two instances of quantitative profiles are considered: the line profile and the character profile. Their performance is studied on the TREC 2007 and CEAS 2008 corpora and on a private corpus. At low computational cost, the two quantitative profiles achieve performance at least comparable to that of heuristic rules and naive Bayes.
Keywords: Email Categorization, Spam Filtering, Quantitative Profile, Character Profile, Line Profile, Random Forest
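To make the fixed-length representation concrete, the following is a minimal sketch of how an email body might be mapped to an m-dimensional vector. The abstract does not define the two profiles, so both definitions here are assumptions made for illustration only: the line profile is taken to be the lengths of the first m lines (zero-padded), and the character profile is taken to be the counts of characters from a fixed alphabet. The names `line_profile`, `character_profile` and `ALPHABET` are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical fixed alphabet; its size fixes m for the character profile.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 !$%@"


def line_profile(body: str, m: int = 8) -> list[int]:
    """Lengths of the first m lines, zero-padded to exactly m entries.

    Assumed definition for illustration; the paper's may differ.
    """
    lengths = [len(line) for line in body.splitlines()[:m]]
    return lengths + [0] * (m - len(lengths))


def character_profile(body: str, alphabet: str = ALPHABET) -> list[int]:
    """Count of each alphabet character in the lower-cased body.

    Assumed definition for illustration; the paper's may differ.
    """
    counts = Counter(body.lower())
    return [counts[ch] for ch in alphabet]


email = "Hello,\nWIN $$$ now!\n\nClick here"
print(line_profile(email, m=5))       # [6, 12, 0, 10, 0]
print(len(character_profile(email)))  # same as len(ALPHABET)
```

Under either assumed definition, every email yields a vector of the same length regardless of its size, so the vectors could be passed directly to a standard classifier such as the random forest named in the keywords.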
ABOUT THE AUTHORS
Marian Grendar
Slovanet, a.s., Zahradnicka 151, 821 08 Bratislava, Slovakia
Jana Skutova
Slovanet, a.s., Zahradnicka 151, 821 08 Bratislava, Slovakia
Vladimir Spitalsky
Slovanet, a.s., Zahradnicka 151, 821 08 Bratislava, Slovakia