Spam Filtering with SpamAssassin™

Note: This document describes how to set up automatic e-mail filtering using tags added to your incoming e-mail by SpamAssassin™, a software package installed on the CIS Unix mail handling machines. This is our primary suggested method for spam filtering. It's a reasonably effective method for decreasing the hassle of dealing with spam, and is very easy to set up.

General Filtering Guidelines

People often ask for a method to handle unwanted e-mail (aka "spam") sent to their CIS Unix accounts. Spam is a waste of time and computing resources, and people who send spam (aka "spammers") are often sleazy types peddling shoddy products and services, often of dubious legality.

Worse, high volumes of spam can make it difficult to deal with your non-spam mail. (Following convention, we'll call your non-spam mail "ham" in the following.)

As a matter of policy, CIS Unix users are considered to be responsible for making their own decisions on mail they do and don't want to read. CIS doesn't want to be in the business of reading your incoming messages and deciding whether you would find them uninteresting or offensive. But we do want to give you tools that minimize the hassle of dealing with such messages.

The procmail program on the CIS Unix systems is used to deliver mail to CIS Unix users. Its normal operation is to put all incoming messages in your default INBOX. But you can also have a .procmailrc file in your home directory that will alter this operation based on the content of the incoming messages. This process is called "filtering"; the .procmailrc file contains one or more "rules" to automatically divert certain messages to other mailboxes, forward them to other addresses, or delete them entirely.
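As a rough illustration of what such rules look like, here is a sketch of three procmail recipes, one for each kind of action. The folder name, subject text, and addresses are hypothetical placeholders, not anything set up for you. Each recipe starts with :0 (a trailing colon, as in :0:, asks procmail to lock the mailbox while writing to it); each line beginning with * is a condition, a regular expression matched against the message headers; and the last line is the action.

    # Divert: file messages whose subject mentions a (hypothetical)
    # mailing list into their own folder.
    :0:
    * ^Subject:.*\[example-list\]
    mail/example-list

    # Forward: send mail from one (hypothetical) sender to another address.
    :0
    * ^From:.*someone@example\.com
    ! someone.else@example.org

    # Delete: discard messages with a particular subject. There is no
    # undo, so use this kind of rule with care.
    :0
    * ^Subject:.*make money fast
    /dev/null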

No automatic process can (yet) substitute for your personal judgment on whether any given message is spam or not. Inevitably, even carefully designed filters will have "false positives" (messages that the filter thinks look like spam, but aren't) and "false negatives" (messages that the filter thinks don't look like spam, but are).

So please note: CIS will not be responsible for incorrect classification of incoming messages, either false positives or false negatives.

For this reason, this method doesn't throw away suspicious mail without giving you a chance to read it (although an easy change will make it do so). Instead, suspicious mail is diverted to mailboxes other than your normal INBOX. The idea is that you can skim those other mailboxes at a lower priority, expecting that the messages in them are almost certainly all junk.

About SpamAssassin

Incoming e-mail to CIS Unix accounts passes through the SpamAssassin™ mail scanning program. SpamAssassin uses a number of heuristic tests to score each message it sees: the higher the score, the more likely the mail is spam. Before each message is delivered to your account, special tags are added to it recording the score and SpamAssassin's guess as to whether the mail is spam or not.

While SpamAssassin does a very good job of guessing whether incoming messages are spam or not, you shouldn't forget that a very good guess is still a guess. SpamAssassin will probably classify a relatively small fraction of non-spam messages as spam ("false positives"); these will be delivered to the IN.spam folder. (That's why the method described here doesn't simply throw such messages away, although a simple change, described below, will let you do that.)

Conversely, SpamAssassin will probably fail to correctly detect a fraction of spam messages ("false negatives"), which will wind up in your INBOX.

Setting Up Filtering (The Easy Way)

If you are a WebMail user, you can start SpamAssassin-based mail filtering by following the instructions here. (You can disable filtering similarly.)

Non-WebMail users can instead choose the appropriate link at this location: https://webmail.unh.edu/cisunix/spamfilter.html. (There's also a link on that page to stop mail filtering, should you decide it's not for you.)

Either method will put default mail-filtering rules in your .procmailrc file as described above. Filtering starts immediately.

You only need to read on if (a) you want to incorporate spam filtering more intelligently into your existing procmail rules; (b) you want to adjust the procmail filtering rules to something other than the default; (c) you want to throw away probable spam, instead of directing it to a separate folder; or (d) you just want to know more about what's going on.

The Gory Details

SpamAssassin works by tagging mail messages with additional "mail headers"; these headers are typically not shown when you're reading your mail. For a probable-spam message, the added headers might look something like this:

    X-MailScanner-SpamCheck: spam, SpamAssassin (score=6.7, required 5,
	    DEAR_SOMETHING, FROM_ALL_NUMS, FROM_AND_TO_SAME, FROM_ENDS_IN_NUMS,
	    NO_REAL_NAME, RESENT_TO)
    X-MailScanner-SpamScore: ssssss

For a probable-nonspam message, the addition might look like this:

    X-MailScanner-SpamCheck: not spam, SpamAssassin (score=-0.8, required 5,
	    RESENT_TO)

SpamAssassin has assigned the first message a score of 6.7, the second a score of -0.8. By default, SpamAssassin considers a score of 5 or more to indicate a probable spam message. For messages with positive scores, an X-MailScanner-SpamScore header is also added, containing one s character for each whole point of the score (six for the score of 6.7 above).
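If you'd rather rely on SpamAssassin's own verdict than on counting s characters, a rule could match the start of the X-MailScanner-SpamCheck header instead. The sketch below assumes the header format shown above: it files a message only when the header value begins with the word spam, so messages labeled "not spam" don't match.

    # File messages that SpamAssassin itself labeled "spam",
    # i.e. whose score reached the "required" threshold.
    :0:
    * ^X-MailScanner-SpamCheck: spam
    mail/IN.spam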

The default .procmailrc rule provided by the web-based setup looks like this:

    :0:
    * ^X-MailScanner-SpamScore: sssss
    mail/IN.spam

This says: if the mail headers contain an X-MailScanner-SpamScore header followed by five (or more) s characters, put the mail into the IN.spam file in your mail directory. (This is the default location for mail folders in Pine and WebMail.)
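The threshold is set simply by the number of s characters in the condition. For example, a more conservative version of the default rule might require eight of them, so that only messages scoring 8 or more are diverted (8 is an arbitrary illustrative value, not a recommendation):

    # Divert only messages whose whole-number score is 8 or more.
    :0:
    * ^X-MailScanner-SpamScore: ssssssss
    mail/IN.spam

Using fewer s characters works the other way, catching more spam at the cost of more false positives.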

If you already have a .procmailrc file, the web-based setup places this rule at the end; it will be checked after the ones you had set up previously.
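Order matters here: procmail tries your rules from top to bottom and stops at the first one that matches and delivers the message. So if, for example, your existing file ends with a rule that forwards all your mail elsewhere, a spam rule appended after it will never be checked. A sketch of the fix, using a hypothetical forwarding address, is to move the spam rule above the forward:

    # Check for probable spam first...
    :0:
    * ^X-MailScanner-SpamScore: sssss
    mail/IN.spam

    # ...then forward everything else (hypothetical address).
    # A recipe with no * conditions matches every message.
    :0
    ! someone@example.org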

You can also create or modify your .procmailrc file by logging into your CIS Unix account and using an editor (like pico or vi). For example:

    % pico .procmailrc

If you do this, please note that punctuation and spacing are extremely important in this file; getting them wrong can cause lost mail. With that warning, here are some ways you can alter the default setup:
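One common alteration is to throw probable spam away instead of filing it; delivering a message to /dev/null discards it permanently. The sketch below is a cautious version: it discards only messages scoring 10 or more (an arbitrary, illustrative cutoff) and files the rest of the probable spam in IN.spam as before, so you can still review borderline cases. The discard rule has to come first, since procmail stops at the first matching rule, and it needs no lock file, hence :0 rather than :0:.

    # Discard messages whose whole-number score is 10 or more.
    # No lock file is needed when delivering to /dev/null.
    :0
    * ^X-MailScanner-SpamScore: ssssssssss
    /dev/null

    # File the remaining probable spam (scores from 5 up to,
    # but not including, 10) as usual.
    :0:
    * ^X-MailScanner-SpamScore: sssss
    mail/IN.spam

To discard all probable spam instead, you could change mail/IN.spam to /dev/null in the default rule and drop the trailing colon from :0:.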

Client Filtering

The method described in this document is server-side filtering: it filters your mail on our server as it arrives in your Unix account. Most modern mail client programs allow you to filter messages as well; this is client-side filtering. If you use Microsoft Outlook to read your mail, you can configure it to read SpamAssassin's header and perform an appropriate action. Instructions are here. A similar method for recent versions of Eudora is described here.

Our setup of SpamAssassin doesn't allow filtering by current versions of Microsoft Outlook Express, sorry.


Page Maintenance:
Paul A. Sand <pas@unh.edu>
Last modified: 2012-05-07 8:54 AM EDT