Friday, June 19, 2009

Detecting spam





People tend to be much less bothered by spam slipping through filters into their mailbox (false negatives) than by having desired e-mail ("ham") blocked (false positives). Balancing false negatives (missed spam) against false positives (rejected good e-mail) is critical for a successful anti-spam system. Some systems give individual users some control over this balance, for example by letting them set "spam score" limits. Most techniques produce both kinds of errors, to varying degrees. An anti-spam system may therefore accept a high false negative rate (missing a lot of spam) in order to reduce the number of false positives (rejecting good e-mail).
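The effect of a "spam score" limit on the two kinds of errors can be sketched as follows. This is a toy illustration only: the scores and labels in `MESSAGES` are invented, and `count_errors` is a hypothetical helper, not part of any particular filter.

```python
# Hypothetical (score, is_spam) pairs; the scores are invented for illustration.
MESSAGES = [
    (1.2, False),
    (3.8, False),
    (5.5, False),  # legitimate mail that happens to score high
    (4.1, True),   # spam that happens to score low
    (6.3, True),
    (9.0, True),
]


def count_errors(messages, limit):
    """Return (false_positives, false_negatives) for a score limit.

    A message is treated as spam when its score is >= limit.
    """
    false_positives = sum(
        1 for score, is_spam in messages if score >= limit and not is_spam
    )
    false_negatives = sum(
        1 for score, is_spam in messages if score < limit and is_spam
    )
    return false_positives, false_negatives
```

With this made-up data, a limit of 4.0 blocks one good message and misses no spam, while raising the limit to 6.0 blocks no good mail but lets one spam through. That is the trade-off a user makes when adjusting the limit.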

Detecting spam based on the content of the e-mail, either by detecting keywords such as "viagra" or by statistical means, is very popular. Such methods can be very accurate when they are correctly tuned to the types of legitimate e-mail that an individual gets, but they can also make mistakes, such as detecting the keyword "cialis" in the word "specialist". The content also doesn't determine whether the e-mail was unsolicited or sent in bulk, the two key features of spam. So, if a friend sends you a joke that mentions "viagra", content filters can easily mark it as spam even though it is neither unsolicited nor sent in bulk.
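The "cialis" in "specialist" mistake is easy to reproduce. The sketch below is a minimal illustration, not how any particular filter works: `KEYWORDS`, `naive_match` and `word_match` are hypothetical names. It compares plain substring matching with matching on word boundaries.

```python
import re

# Illustrative keyword list taken from the examples in the text.
KEYWORDS = ["viagra", "cialis"]


def naive_match(text):
    """Return keywords found anywhere in the text, even inside other words."""
    lowered = text.lower()
    return [k for k in KEYWORDS if k in lowered]


def word_match(text):
    """Return keywords that appear as whole words only."""
    lowered = text.lower()
    return [
        k for k in KEYWORDS
        if re.search(r"\b" + re.escape(k) + r"\b", lowered)
    ]
```

Here `naive_match("Please see a specialist")` reports `cialis`, while `word_match` on the same text reports nothing. Word boundaries only fix the substring problem, though: a friend's joke that really contains the word "viagra" would still be flagged by both functions.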

The most popular DNSBLs (DNS Blacklists) are lists of IP addresses of known spammers, open relays, zombie computers that send spam, and so on.
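A DNSBL is normally queried through ordinary DNS: the octets of the IPv4 address are reversed and the list's zone is appended, and an answer means the address is listed. The sketch below assumes that convention. The zone `dnsbl.example.org` in the usage note is a placeholder rather than a real list, and the function names are hypothetical.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the lookup resolves, i.e. the address appears listed.

    Any resolution failure is treated as "not listed", which also covers
    network errors; a production check would distinguish the two cases.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True
```

For example, `dnsbl_query_name("192.0.2.99", "dnsbl.example.org")` gives `99.2.0.192.dnsbl.example.org`. Calling `is_listed` performs a real DNS query, so it needs a working resolver and the zone of an actual list.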

Spamtraps are often email addresses that were never valid, or that have been invalid for a long time, and that are used to collect spam. An effective spamtrap is not announced and is only found by dictionary attacks or by pulling addresses off hidden webpages. For a spamtrap to remain effective, the address must never be given to anyone. Some blacklists, such as SpamCop, use spamtraps to catch spammers and blacklist them.

Enforcing the technical requirements of the Simple Mail Transfer Protocol (SMTP) can be used to block mail coming from systems that are not compliant with the RFC standards. Many spammers use poorly written software, or are unable to comply with the standards because they do not have legitimate control of the computer sending the spam (a zombie computer). By setting restrictions on the mail transfer agent (MTA), a mail administrator can therefore reduce spam significantly, for example by requiring the correct fallback between Mail eXchange (MX) records in the Domain Name System, or the correct handling of delays (a tarpit, or Teergrube).
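The MX fallback idea can be sketched as follows. This is a simplified model: a compliant sender tries MX hosts in order of increasing preference value and moves on when one is unreachable, while a careless sender tries only the first. The function names and the `example.org` hostnames are hypothetical, and real senders also retry later and randomise among hosts with equal preference.

```python
def delivery_order(mx_records):
    """Return hostnames ordered by MX preference (lowest value first).

    mx_records is a list of (preference, hostname) tuples.
    """
    return [host for _, host in sorted(mx_records, key=lambda r: r[0])]


def compliant_target(mx_records, reachable):
    """Host a standards-compliant sender ends up delivering to, or None."""
    for host in delivery_order(mx_records):
        if host in reachable:
            return host
    return None


def single_try_target(mx_records, reachable):
    """Host reached by a sender that tries only the most preferred MX."""
    order = delivery_order(mx_records)
    if order and order[0] in reachable:
        return order[0]
    return None
```

If the most preferred MX is deliberately left unreachable, `compliant_target` still finds the second host, whereas `single_try_target` returns `None`. That difference is what lets an administrator filter out senders that do not implement fallback.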
