[PLUG] Backscatter problem...

Terry Griffin griffint at pobox.com
Mon Sep 7 02:12:55 UTC 2009


plug_1 at robinson-west.com wrote:
> I use Postfix 2.2.10 and I wonder what common practice is to stop
> backscatter email?  I see that body checks and header checks are
> recommended.
>
> I turned off grey listing and have since been looking to improve my spam
> prevention without it.  So far, I have studied the sorbs black lists and
> I've added a few more.  I was originally using just
> dnsbl.sorbs.net where I have added the spam list and all the rhsbl
> lists.  I am also using just bl.spamcop.net and I have been trying to
> produce my own dns black list.  I figure if I add more black
> lists I will stop more spam from getting onto my mail servers in the
> first place.  The trick is figuring out which ones to add.
>

I had a substantial backscatter problem last spring. I ended up writing
a bunch of custom Python code to filter it. The algorithm went something
like this:

1. Determine if the incoming message is some sort of bounce (possibly
legit). This is determined by two things:
a. The "From:" address (typically postmaster or mailer-daemon).
b. Some or all of an original message is included either in-line
or as a MIME attachment.

If it is a bounce then:

2. If the original message is included in its entirety, then run the
original through your regular spam filters. If it turns out to be spam
then the bounce is backscatter.

(This next step requires that you have a name for each email address
on your system, i.e. that jsmith at example.com is "Joe Smith". Where
you don't have that information you'll have to skip this step.)

3. Look for the "From:" line in the header of the original message.
Parse out the email address and the name. Check to see if the name
is a possible match for the email address. Typically in bounced
spam you'll see a complete mismatch, e.g.:

    From: "Wendy" <jsmith at example.com>

There's no way "Wendy" could be a match for "Joe Smith", so the bounce
is likely backscatter. I used a fuzzy string comparison so that slight
variations and typos of "Joe Smith" would count as a match (a legit
bounce). Python's difflib.get_close_matches() works well for this.
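For illustration, here is a rough sketch of how the name check in step 3
could look. This is not the original code. The KNOWN_NAMES table, the
name_matches() helper and the 0.6 cutoff are assumptions made up for the
example; only difflib.get_close_matches() and email.utils.parseaddr()
are real standard-library calls.

```python
import difflib
from email.utils import parseaddr

# Hypothetical directory of local addresses and the real names behind them.
KNOWN_NAMES = {"jsmith@example.com": "Joe Smith"}


def name_matches(from_header, cutoff=0.6):
    """Compare the display name in a From: header with the known name.

    Returns True for a plausible match, False for a mismatch, and None
    when there is nothing to compare (unknown address or no display name).
    """
    display, addr = parseaddr(from_header)
    expected = KNOWN_NAMES.get(addr.lower())
    if expected is None or not display:
        return None
    # get_close_matches returns the candidates whose similarity ratio
    # is at least `cutoff`; an empty list means no plausible match.
    close = difflib.get_close_matches(
        display.lower(), [expected.lower()], n=1, cutoff=cutoff
    )
    return bool(close)
```

With this table, a header naming "Wendy" for jsmith is rejected, while
"Joe Smiht" or "J. Smith" still counts as a match. The cutoff would need
tuning against real mail.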

This algorithm ended up catching well over 90% of the backscatter. I
eventually reversed steps 2 and 3 to reduce CPU load. Step 3 is fairly
cheap CPU-wise and very effective, so if it detects backscatter you can
skip CPU-intensive step #2.
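A minimal sketch of that ordering, again not the original code: it
assumes the original message arrives as a message/rfc822 MIME attachment
(the in-line case from step 1b is not handled), and it takes the name
check and the spam filter as caller-supplied functions. The names
embedded_original(), looks_like_bounce() and is_backscatter() are
invented for the example.

```python
import email
from email.utils import parseaddr

# Local parts that bounces typically come from (step 1a).
BOUNCE_LOCALPARTS = {"postmaster", "mailer-daemon"}


def embedded_original(msg):
    """Return the original message attached as message/rfc822, else None."""
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            payload = part.get_payload()
            if isinstance(payload, list) and payload:
                return payload[0]
    return None


def looks_like_bounce(msg):
    """Step 1: bounce-like sender and an attached original message."""
    _, addr = parseaddr(msg.get("From", ""))
    local = addr.split("@")[0].lower()
    return local in BOUNCE_LOCALPARTS and embedded_original(msg) is not None


def is_backscatter(msg, name_mismatch, is_spam):
    """Run the cheap name check first, the spam filter only if needed.

    name_mismatch and is_spam are hypothetical callables that take the
    original message and return True or False.
    """
    if not looks_like_bounce(msg):
        return False
    original = embedded_original(msg)
    if name_mismatch(original):  # cheap check (step 3)
        return True
    return is_spam(original)  # CPU-intensive check (step 2)
```

Because the name check short-circuits, the spam filter only runs on
bounces whose name looks plausible or cannot be checked.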

Terry




